Python Job: Site Reliability Engineer

Job added on

Company

Sensei Labs
Canada

Location

Remote Position
(From Everywhere/No Office Location)

Job type

Full-Time

Python Job Details

We are looking for a Site Reliability Engineer to join our team and develop software systems and automated solutions related to monitoring and alerting for our SaaS application.

You will work with our Platform & Engineering teams to ensure our application has excellent monitoring, alerting and observability so that we can deliver an excellent experience to our customers.

The ideal candidate will be passionate about the large opportunity that Sensei Labs presents. This person must thrive and succeed in delivering high quality solutions in a hyper-growth environment where priorities can shift fast. If you're looking to solve challenging technical problems and create a great product for our customers, then this is the right role for you.

RESPONSIBILITIES

  • Build monitoring that alerts on symptoms rather than on outages
  • Develop monitoring, alerting, and dashboard systems that provide good visibility into the health and state of the system
  • Use Chaos Engineering principles to test what you build under real-world conditions
  • Distinguish between monitoring for resource provisioning and monitoring for adverse events needing mitigation in other ways
  • Administer production jobs
  • Understand debugging info
  • Roll back a bad software push
  • Block or rate-limit unwanted traffic
  • Bring up additional serving capacity
  • Work closely with internal partners and teams to ensure that we ship software that meets security, SLA, and performance requirements
  • Write, update, and use documentation, including runbooks/playbooks
  • Automate work, including infrastructure needs, testing, failover solutions, failure mitigation, and much more
  • Debug complex problems across the entire stack and create solid solutions
  • Develop CI/CD processes to improve release cadence

REQUIREMENTS AND SKILLS

  • Proven work experience as a Site Reliability Engineer or similar role
  • Experience with monitoring and observability tools such as Datadog, Sensu, New Relic, or Nagios
  • Specific, demonstrable experience in developing monitoring and alerting systems
  • Experience debugging complex problems
  • Experience designing, building, and operating large-scale production systems
  • Knowledge of Python, Java, Go, Rust, or a similar language
  • Understanding of networking and messaging, especially between services
  • Hands-on experience using source control (Git, GitHub) and feature branching strategies
  • Experience with a variety of databases
  • Experience with containers, such as Docker or Kubernetes
  • Experience automating infrastructure, testing, and deployments, and the ability to explain the Infrastructure as Code paradigm
  • Experience with configuration management

ABOUT US:

At Sensei Labs, we're continuing to build an amazing, diverse team and an inclusive culture. Our competitive advantage is rooted in the unique perspectives and experiences of our team members. We encourage you to apply even if you don't have all the qualifications listed but want to bring new ideas and perspectives to augment our team.

We're committed to ensuring equal access to employment opportunities for all qualified candidates, including candidates of color, women, LGBTQ+ candidates, candidates with family caregiving responsibilities, Indigenous candidates, immigrant candidates, and differently abled candidates. If you require accommodation during the application or interview process, please let us know and we’ll work with you to ensure you have a positive experience.

Job Type: Permanent

Salary: $92,447.00-$103,956.00 per year

Benefits:

  • Paid time off
  • Vision care

Schedule:

  • 8-hour shift

Experience:

  • DevOps: 1 year (preferred)

Work Location: Remote